Last updated: 2025-06-20
Checks: 6 passed, 1 failed
Knit directory: casper_ss_ma/analysis/
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command set.seed(12345) was run prior to running the
code in the R Markdown file. Setting a seed ensures that any results
that rely on randomness, e.g. subsampling or permutations, are
reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Using absolute paths to the files within your workflowr project makes it difficult for you and others to run your code on a different machine. Change the absolute path(s) below to the suggested relative path(s) to make your code more reproducible.
| absolute | relative |
|---|---|
| /Volumes/scratch/DIMA/piva/casper_ss_ma/ | .. |
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version f0e862c. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for
the analysis have been committed to Git prior to generating the results
(you can use wflow_publish or
wflow_git_commit). workflowr only checks the R Markdown
file, but you know if there are other scripts or data files that it
depends on. Below is the status of the Git repository when the results
were generated:
Ignored files:
Ignored: .RData
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: analysis/figure/
Untracked files:
Untracked: .DS_Store
Untracked: analysis/.DS_Store
Untracked: analysis/02_degs_go_aneuploidy_median.Rmd
Untracked: analysis/03_degs_go_CD82expr_median.Rmd
Untracked: analysis/VennDiagram.2025-06-09_13-53-40.335615.log
Untracked: analysis/VennDiagram.2025-06-09_13-54-51.029086.log
Untracked: analysis/VennDiagram.2025-06-09_13-55-15.147126.log
Untracked: analysis/VennDiagram.2025-06-09_13-56-18.122749.log
Untracked: analysis/VennDiagram.2025-06-09_13-56-30.934079.log
Untracked: analysis/VennDiagram.2025-06-09_14-18-19.412377.log
Untracked: analysis/VennDiagram.2025-06-18_10-28-53.699452.log
Untracked: analysis/VennDiagram.2025-06-18_10-37-36.77178.log
Untracked: analysis/VennDiagram.2025-06-18_11-32-36.228427.log
Untracked: analysis/VennDiagram.2025-06-18_15-38-55.387683.log
Untracked: analysis/VennDiagram.2025-06-18_15-48-17.579371.log
Untracked: analysis/VennDiagram.2025-06-18_17-18-17.268774.log
Untracked: analysis/VennDiagram.2025-06-19_11-11-17.376961.log
Untracked: analysis/VennDiagram.2025-06-19_14-52-46.049026.log
Untracked: analysis/VennDiagram.2025-06-19_16-40-05.861139.log
Untracked: analysis/VennDiagram.2025-06-19_16-40-07.33202.log
Untracked: analysis/VennDiagram.2025-06-19_16-40-08.673023.log
Untracked: analysis/VennDiagram.2025-06-19_17-50-05.238063.log
Untracked: analysis/VennDiagram.2025-06-19_17-50-07.22979.log
Untracked: analysis/VennDiagram.2025-06-19_17-50-09.007028.log
Untracked: analysis/VennDiagram.2025-06-19_18-48-01.885712.log
Untracked: analysis/VennDiagram.2025-06-19_18-48-03.579702.log
Untracked: analysis/VennDiagram.2025-06-19_18-48-04.898695.log
Untracked: analysis/VennDiagram.2025-06-20_10-18-23.300456.log
Untracked: analysis/VennDiagram.2025-06-20_10-18-24.588109.log
Untracked: analysis/VennDiagram.2025-06-20_10-18-26.077856.log
Untracked: analysis/VennDiagram.2025-06-20_10-50-54.081682.log
Untracked: analysis/VennDiagram.2025-06-20_10-50-55.516535.log
Untracked: analysis/VennDiagram.2025-06-20_10-50-56.913582.log
Untracked: analysis/VennDiagram.2025-06-20_11-10-43.68944.log
Untracked: analysis/VennDiagram.2025-06-20_11-10-45.681514.log
Untracked: analysis/VennDiagram.2025-06-20_11-10-47.126222.log
Untracked: analysis/hsa04064.HLT-HighAS_vs_HLT-LowAS.png
Untracked: analysis/hsa04064.HLT-HighCD82_vs_HLT-LowCD82.png
Untracked: analysis/hsa04064.HRplus-HighAS_vs_HRplus-LowAS.png
Untracked: analysis/hsa04064.HRplus-HighCD82_vs_HRplus-LowCD82.png
Untracked: analysis/hsa04064.HRplus_vs_HLT.png
Untracked: analysis/hsa04064.TNBC-HighAS_vs_TNBC-LowAS.png
Untracked: analysis/hsa04064.TNBC-HighCD82_vs_TNBC-LowCD82.png
Untracked: analysis/hsa04064.TNBC_vs_HLT.png
Untracked: analysis/hsa04064.TNBC_vs_HRplus.png
Untracked: analysis/hsa04064.png
Untracked: analysis/hsa04064.xml
Untracked: code/
Untracked: data/
Untracked: degs_HLT-HighAS_vs_HLT-LowAS.csv
Untracked: degs_HLT-HighCD82_vs_HLT-LowCD82.csv
Untracked: degs_HRplus-HighAS_vs_HRplus-LowAS.csv
Untracked: degs_TNBC-HighCD82_vs_TNBC-LowCD82.csv
Untracked: output/
Unstaged changes:
Modified: analysis/00_casper_analysis.Rmd
Deleted: analysis/02_deconvolution.Rmd
Modified: analysis/index.Rmd
Modified: casper_ss_ma.Rproj
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were
made to the R Markdown (analysis/01_degs_go.Rmd) and HTML
(docs/01_degs_go.html) files. If you’ve configured a remote
Git repository (see ?wflow_git_remote), click on the
hyperlinks in the table below to view the files as they were in that
past version.
| File | Version | Author | Date | Message |
|---|---|---|---|---|
| Rmd | f0e862c | annamariapiva | 2025-06-20 | new reports |
The goal of this analysis is to identify which pathways are up- or down-regulated between conditions (Healthy, HR+, and TNBC). The following comparisons were performed:
- HR+ vs Healthy
- TNBC vs Healthy
- TNBC vs HR+
The analysis includes:
The input for the following analysis is:
```r
# Hide code, messages and warnings in the rendered report
knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE)
```
The first steps of the analysis in R are to load the required packages, load the input data mentioned above, and set the thresholds for the analysis:

Let’s look at the PCA and at the gene expression patterns across samples. The batch effect has been included in the design.
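A minimal base-R sketch of a PCA on variance-stabilized data follows. It is not the report's own code: `vst_mat` is a hypothetical, simulated genes × samples matrix, and keeping the 500 most variable genes (`ntop`) mirrors the default of DESeq2's `plotPCA()`, which may or may not match what this report does.

```r
set.seed(12345)
n_genes <- 1000
samples <- paste0("S", 1:6)
# Hypothetical variance-stabilized matrix (genes x samples), simulated here
vst_mat <- matrix(rnorm(n_genes * length(samples), mean = 8, sd = 1),
                  nrow = n_genes,
                  dimnames = list(paste0("gene", seq_len(n_genes)), samples))
# Add a group effect to the first 100 genes in samples S4-S6
vst_mat[1:100, 4:6] <- vst_mat[1:100, 4:6] + 3

# Keep the ntop most variable genes (500 is the DESeq2::plotPCA default)
ntop <- 500
gene_var <- apply(vst_mat, 1, var)
top_genes <- order(gene_var, decreasing = TRUE)[seq_len(min(ntop, nrow(vst_mat)))]

# prcomp() expects samples in rows, so transpose; genes are centred, not scaled
pca <- prcomp(t(vst_mat[top_genes, ]), center = TRUE, scale. = FALSE)

# Percentage of total variance explained by each principal component
percent_var <- 100 * pca$sdev^2 / sum(pca$sdev^2)
pca_df <- data.frame(sample = rownames(pca$x), PC1 = pca$x[, 1], PC2 = pca$x[, 2])
print(pca_df)
round(percent_var[1:2], 1)
```

Samples that cluster together on PC1/PC2 have similar expression profiles. With the simulated group effect, S1–S3 and S4–S6 separate along PC1.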
To evaluate the similarity between RNA-seq samples, we computed both Pearson correlation and Euclidean distance using variance-stabilized expression data:
Pearson correlation (cor()): Measures how similarly the expression profiles of two samples vary across genes. Values range from –1 (perfect inverse linear relationship) to +1 (perfect positive linear relationship). High correlations indicate that samples share similar expression trends.
Euclidean distance (dist()): Quantifies the overall dissimilarity between two samples, based on their expression values across all genes rather than only on their pattern. Smaller distances indicate more similar samples.
These metrics provide complementary views: correlation focuses on shared expression patterns (direction), while distance captures overall differences in magnitude. Both are useful for identifying sample clusters, detecting outliers, and validating experimental reproducibility.
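The two metrics can be sketched in base R as follows. This is an illustration, not the report's code. `vst_mat` is a hypothetical simulated matrix standing in for the variance-stabilized data, and the hierarchical clustering step is an added example of "identifying sample clusters". Note that `cor()` compares columns, while `dist()` compares rows, so the matrix has to be transposed for the distance.

```r
set.seed(12345)
n_genes <- 200
# Hypothetical genes x samples matrix standing in for the variance-stabilized data
vst_mat <- matrix(rnorm(n_genes * 4, mean = 8), nrow = n_genes,
                  dimnames = list(paste0("gene", seq_len(n_genes)),
                                  c("A1", "A2", "B1", "B2")))
# Give A1/A2 and B1/B2 a shared gene-level pattern (like replicates of one group)
shared_a <- rnorm(n_genes)
shared_b <- rnorm(n_genes)
vst_mat[, c("A1", "A2")] <- vst_mat[, c("A1", "A2")] + 2 * shared_a
vst_mat[, c("B1", "B2")] <- vst_mat[, c("B1", "B2")] + 2 * shared_b

# Pearson correlation between samples: cor() correlates the columns
cor_mat <- cor(vst_mat, method = "pearson")

# Euclidean distance between samples: dist() works on rows, so transpose first
sample_dist <- dist(t(vst_mat), method = "euclidean")
dist_mat <- as.matrix(sample_dist)

round(cor_mat, 2)
round(dist_mat, 1)

# Hierarchical clustering on the distances shows which samples group together
clusters <- cutree(hclust(sample_dist), k = 2)
clusters
```

Replicates of the same simulated group show a high correlation and a small distance. Each metric confirms the other's grouping.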


Differential expression analysis is performed with a custom function that accounts for the batch effect by including it in the design. A batch effect occurs when non-biological factors in an experiment, such as laboratory conditions or the instruments used, introduce systematic changes in the data. Lowly expressed genes are removed to reduce noise. Here, lowly expressed genes are defined as:
Each gene is annotated as significant or not significant, to separate genes with meaningful changes from the rest. A gene is significant when its adjusted p-value is below the threshold defined above and its absolute log2 fold change is greater than the cutoff defined above.
Among the significant differentially expressed (DE) genes computed above, let’s visualize the top 20 and then all DE genes.
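The annotation rule can be sketched in base R as below. The results table `res` is simulated. `padj_cutoff` and `lfc_cutoff` are placeholders, not the thresholds actually set earlier in the report. The Benjamini–Hochberg adjustment is an assumption: it is the usual default, but the report does not state it here. Ranking the top 20 by adjusted p-value is also an assumption.

```r
set.seed(12345)
# Hypothetical DE results table standing in for the output of the custom DE function
res <- data.frame(
  gene = paste0("gene", 1:200),
  log2FoldChange = rnorm(200, sd = 1.5),
  pvalue = runif(200)^3
)
res$padj <- p.adjust(res$pvalue, method = "BH")  # assumed BH adjustment

padj_cutoff <- 0.05  # placeholder, not the report's value
lfc_cutoff  <- 1     # placeholder, not the report's value

# A gene is significant only if BOTH conditions hold (NA padj counts as not significant)
res$significant <- !is.na(res$padj) &
  res$padj < padj_cutoff &
  abs(res$log2FoldChange) > lfc_cutoff

# Direction label, e.g. for colouring a volcano plot
res$direction <- ifelse(!res$significant, "NS",
                        ifelse(res$log2FoldChange > 0, "Up", "Down"))

# Top 20 significant genes, ranked by adjusted p-value
sig <- res[res$significant, ]
top20 <- head(sig[order(sig$padj), ], 20)
table(res$direction)
```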
Meaning of Colors

Further analysis is done through gene set enrichment analysis (GSEA). Unlike the previous step, GSEA does not exclude genes based on log2 fold change or adjusted p-value; it uses the full ranked gene list. GSEA is performed separately on each GO subontology: biological process (BP), cellular component (CC) and molecular function (MF). The dot plot below shows the top 10 most enriched GO terms. The size of each dot reflects the number of genes contributing to the enrichment of each GO term. The color of each dot reflects the significance of the enrichment of that GO term, highlighting its relative importance.
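To show why GSEA needs no cutoff, here is a minimal sketch of the weighted running-sum enrichment score from Subramanian et al. (2005). It only illustrates the idea. The report presumably relies on a package implementation (for example clusterProfiler, which uses its own fgsea-based code with permutation p-values), and `enrichment_score()` is a hypothetical helper. The exponent `p = 1` gives the usual weighted statistic.

```r
# Weighted running-sum enrichment score (illustration only)
enrichment_score <- function(ranked_stats, gene_set, p = 1) {
  # ranked_stats: named numeric vector (e.g. log2FC) for ALL tested genes
  ranked_stats <- sort(ranked_stats, decreasing = TRUE)
  in_set <- names(ranked_stats) %in% gene_set
  n_hit <- sum(in_set)
  n_miss <- length(ranked_stats) - n_hit
  if (n_hit == 0 || n_miss == 0) stop("gene set must be a proper, non-empty subset")
  # Hits are weighted by |statistic|^p and normalised to sum to 1
  hit_w <- ifelse(in_set, abs(ranked_stats)^p, 0)
  if (sum(hit_w) == 0) stop("all gene-set statistics are zero")
  hit_w <- hit_w / sum(hit_w)
  # Misses are equally weighted and also sum to 1
  miss_w <- ifelse(in_set, 0, 1 / n_miss)
  running <- cumsum(hit_w - miss_w)
  # ES is the maximum deviation of the running sum from zero
  unname(running[which.max(abs(running))])
}

set.seed(12345)
stats <- setNames(rnorm(1000), paste0("g", 1:1000))
# Hypothetical gene sets made of the 30 highest- and 30 lowest-ranked genes
es_up <- enrichment_score(stats, names(sort(stats, decreasing = TRUE))[1:30])
es_down <- enrichment_score(stats, names(sort(stats))[1:30])
c(up = es_up, down = es_down)
```

A gene set concentrated at the top of the ranking gives a score near +1 (up-regulated term). One concentrated at the bottom gives a score near -1 (down-regulated term).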




To visualize gene expression changes on biological pathways, we used the pathview R package, which maps gene-level statistics (e.g., log2 fold-changes) onto KEGG pathway diagrams.
For each contrast in our differential expression analysis, we extracted significantly differentially expressed genes and passed their log2 fold-change values to pathview() to visualize the NF-kappa B signaling pathway (KEGG pathway ID “hsa04064”). Pathway visualizations highlight upregulated and downregulated genes in red and blue, respectively, based on log2 fold-change.
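The step of passing significant log2 fold changes to `pathview()` can be sketched as below. The helper `prepare_pathview_input()`, the toy table `de_res` (with an `entrez` column) and the cutoffs are hypothetical. `pathview()` expects a numeric vector named by Entrez Gene IDs by default, and writes `hsa04064.<out.suffix>.png`. The `low`/`mid`/`high` colour arguments are an assumption about how the red/blue scheme was obtained. The call itself needs the Bioconductor pathview package and network access to KEGG, so it only runs in an interactive session.

```r
# Build pathview's gene.data input: significant genes only, named by Entrez ID
prepare_pathview_input <- function(de_res, padj_cutoff = 0.05, lfc_cutoff = 1) {
  keep <- !is.na(de_res$padj) & !is.na(de_res$entrez) &
    de_res$padj < padj_cutoff & abs(de_res$log2FoldChange) > lfc_cutoff
  sig <- de_res[keep, ]
  # If several rows map to the same Entrez ID, keep the one with the largest |log2FC|
  sig <- sig[order(-abs(sig$log2FoldChange)), ]
  sig <- sig[!duplicated(sig$entrez), ]
  setNames(sig$log2FoldChange, as.character(sig$entrez))
}

# Toy table: NFKB1, RELA, TNF (two rows), IL6, and a gene without an Entrez ID
de_res <- data.frame(
  entrez = c(4790, 5970, 7124, 7124, 3569, NA),
  log2FoldChange = c(2.1, -1.4, 1.2, 1.8, -2.5, 3.0),
  padj = c(0.001, 0.02, 0.04, 0.01, 0.003, 0.001)
)
gene_data <- prepare_pathview_input(de_res)
gene_data

if (interactive() && requireNamespace("pathview", quietly = TRUE)) {
  pathview::pathview(gene.data = gene_data, pathway.id = "04064", species = "hsa",
                     out.suffix = "TNBC_vs_HLT",  # example contrast label
                     low = list(gene = "blue", cpd = "blue"),
                     mid = list(gene = "gray", cpd = "gray"),
                     high = list(gene = "red", cpd = "yellow"))
}
```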

Each gene is annotated as significant or not significant, to separate genes with meaningful changes from the rest. A gene is significant when its adjusted p-value is below the threshold defined above and its absolute log2 fold change is greater than the cutoff defined above.
Among the significant differentially expressed (DE) genes computed above, let’s visualize the top 20 and then all DE genes.
Meaning of Colors





Each gene is annotated as significant or not significant, to separate genes with meaningful changes from the rest. A gene is significant when its adjusted p-value is below the threshold defined above and its absolute log2 fold change is greater than the cutoff defined above.
Among the significant differentially expressed (DE) genes computed above, let’s visualize the top 20 and then all DE genes.
Meaning of Colors








